Publication date: 27 July 2022

Framework for entity extraction with verification: application to inference of data set usage in research publications

Svetlozar Nestorov, Dinko Bačić, Nenad Jukić and Mary Malliaris

Abstract

Purpose

The purpose of this paper is to propose an extensible framework for extracting data set usage from research articles.

Design/methodology/approach

The framework uses a training set of manually labeled examples to identify word features surrounding data set usage references. Using the word features and general entity identifiers, candidate data sets are extracted and scored separately at the sentence and document levels. Finally, the extracted data set references can be verified by the authors using a web-based verification module.
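The pipeline described above can be illustrated with a minimal sketch. The abstract does not specify the actual feature search space or scoring functions, so everything here is an assumption made for illustration: the training sentences are invented, the "general entity identifier" is approximated by runs of capitalized words, the word features are simple counts of the words immediately preceding a labeled mention, and the function names (`learn_word_features`, `sentence_score`, `document_scores`) are hypothetical.

```python
import re
from collections import Counter

# Hypothetical manually labeled examples: (sentence, data set mention).
TRAINING = [
    ("We use data from the Nielsen Consumer Panel to estimate demand.",
     "Nielsen Consumer Panel"),
    ("Our analysis uses data from the Current Population Survey.",
     "Current Population Survey"),
]


def tokenize(text):
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def learn_word_features(examples, window=3):
    """Count the words found within `window` tokens before each labeled mention."""
    counts = Counter()
    for sentence, mention in examples:
        before = sentence.split(mention)[0]
        counts.update(tokenize(before)[-window:])
    return counts


def candidates(sentence):
    """Crude stand-in for a general entity identifier: 2+ capitalized words in a row."""
    return re.findall(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b", sentence)


def sentence_score(sentence, candidate, features, window=3):
    """Assumed sentence-level score: sum of feature counts just before the candidate."""
    before = sentence.split(candidate)[0]
    return sum(features[word] for word in tokenize(before)[-window:])


def document_scores(sentences, features):
    """Assumed document-level score: total of sentence scores per candidate."""
    totals = Counter()
    for sentence in sentences:
        for candidate in candidates(sentence):
            totals[candidate] += sentence_score(sentence, candidate, features)
    return totals


features = learn_word_features(TRAINING)
document = [
    "Household purchases come from the Nielsen Consumer Panel.",
    "Prices in the Nielsen Consumer Panel are weekly.",
    "We thank the National Science Foundation for support.",
]
scores = document_scores(document, features)
for name, score in scores.most_common():
    print(name, score)
```

In this toy setup a funding agency still receives a nonzero score, which hints at why an author-verification step after automated scoring is useful.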

Findings

This paper addresses a significant gap in the entity extraction literature by focusing on data set extraction. In the process, this paper: identifies an entity-extraction scenario with specific characteristics that enable a multiphase approach, including a feasible author-verification step; defines the search space for word feature identification; defines scoring functions for sentences and documents; and designs a simple web-based author verification step. The framework is successfully tested on 178 articles authored by researchers from a large research organization.

Originality/value

Whereas previous approaches focused on completely automated large-scale entity recognition from text snippets, the proposed framework is designed for longer, high-quality texts, such as research publications. The framework includes a verification module that makes it possible to ask the authors of the research publications to validate the discovered entities. This module shares some similarities with general crowdsourcing approaches, but the target scenario increases the likelihood of meaningful author participation.

Details

The Electronic Library, vol. 40 no. 4

Type: Research Article

DOI:

ISSN: 0264-0473
